High-Dimensional Econometrics and Model Selection
Abstract
This dissertation consists of three chapters. Chapter 1 proposes a new method for the many-moment problem: in the Generalized Method of Moments (GMM), when the number of moment conditions is comparable to or larger than the sample size, traditional methods lead to biased estimators. We propose a LASSO-based selection procedure that chooses the informative moments and then conducts optimal GMM using the selected moments. Our method can significantly reduce the bias of the optimal GMM estimator while retaining most of the information in the full set of moments. We establish asymptotic theory for the LASSO and post-LASSO estimators. The LASSO formulation is a convex optimization problem, so its computational cost is low compared to all existing alternative moment selection procedures. We propose data-driven penalty terms, which are computed by a non-trivial adaptive algorithm.
In Chapter 2, we consider partially identified models with many inequalities. Under such circumstances, existing inference procedures may break down asymptotically and are computationally difficult to conduct. We first propose a combinatorial method to select the informative inequalities in the Core Determining Class problem, in which a large set of linear inequalities is generated from a bipartite graph. Our method selects the set of irredundant inequalities and outperforms all existing methods both in reducing the number of inequalities and in computational speed. We then consider a more general problem with many linear inequalities and propose an inequality selection method similar to the Dantzig selector. We establish theoretical results for this selection method under our sparsity assumptions.
Chapter 3 proposes an innovative way of reporting results in empirical analysis of economic data. Instead of reporting the Average Partial Effect, we propose to report multiple effects sorted in increasing order, as an alternative and more complete summary measure of the heterogeneity in the model. We establish asymptotics and inference for this procedure via the functional delta method. Numerical examples and an empirical application to female labor supply using data from the 1980 U.S. Census illustrate the performance of our methods in finite samples.
Thesis Supervisor: Victor Chernozhukov
Title: Professor of Economics
Thesis Supervisor: Jerry Hausman
Title: Professor of Economics
Similar resources
Frontiers in Time Series and Financial Econometrics: An Overview
Two of the fastest growing frontiers in econometrics and quantitative finance are time series and financial econometrics. Significant theoretical contributions to financial econometrics have been made by experts in statistics, econometrics, mathematics, and time series analysis. The purpose of this special issue of the journal on “Frontiers in Time Series and Financial Econometrics” is to highl...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
Inference for high-dimensional sparse econometric models
This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regressi...
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
Publication date: 2015